T-code compression for Arabic computational morphology

نویسندگان

  • Jim Yaghi
  • Mark R. Titchener
  • Sane Yagi
چکیده

It is impossible to perform root-based searching, concordancing, and grammar checking in Arabic without a method to match words with roots and vice versa. A comprehensive word list is essential for incremental searching, predictive SMS messaging, and spell checking, but due to the derivational and inflectional nature of Arabic, a comprehensive word list is taxing on storage space and access speed. This paper describes a method for compactly storing and efficiently accessing an extensive dictionary of Arabic words by their morphological properties and roots. Compression of the dictionary is based on T-Code encoding, which follows the Huffman encoding model. The special characteristics inherent in the recursive augmentation method with which codes are created allow compact storage on disk and in memory. They also facilitate the efficient use of bandwidth, for Arabic text transmission, over intranets and the Internet.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Development of a compression system dynamic simulation code for testing and designing of anti-surge control system

In recent years, several research activities have been conducted to develop knowledge in analysis, design and optimization of compressor anti-surge control system. Since the anti-surge control testing on a full-scale compressor is limited to possible consequences of failure, and also the experimental facility can be expensive to set up control strategies and logic, design process often involves...

متن کامل

Arabic-document compression: A close look at group 3 international digital facsimile coding standards

Efficient bit-representation or compression of documents is an important issue in many applications. The amount of compression depends on the document contents such as written scripts, diagrams, tables, etc. The contents of the document determine the limit of this compression. In the CCITI" Recommendation T.4, 'Standardization of group 3 apparatus for document transmission', a modified Huffman ...

متن کامل

Extending the Radar Dynamic Range using Adaptive Pulse Compression

The matched filter in the radar receiver is only adapted to the transmitted signal version and its output will be wasted due to non-matching with the received signal from the environment. The sidelobes amplitude of the matched filter output in pulse compression radars are dependent on the transmitted coded waveforms that extended as much as the length of the code on both sides of the target loc...

متن کامل

Fast Intra Mode Decision for Depth Map coding in 3D-HEVC Standard

three dimensional- high efficiency video coding (3D-HEVC) is the expanded version of the latest video compression standard, namely high efficiency video coding (HEVC), which is used to compress 3D videos. 3D videos include texture video and depth map. Since the statistical characteristics of depth maps are different from those of texture videos, new tools have been added to the HEVC standard fo...

متن کامل

Conventional Orthography for Dialectal Arabic

Dialectal Arabic (DA) refers to the day-to-day vernaculars spoken in the Arab world. DA lives side-by-side with the official language, Modern Standard Arabic (MSA). DA differs from MSA on all levels of linguistic representation, from phonology and morphology to lexicon and syntax. Unlike MSA, DA has no standard orthography since there are no Arabic dialect academies, nor is there a large edited...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003